Incorrect data in the widely used Inside Airbnb dataset
Abstract
Several recently published papers in Decision Support Systems discussed issues related to data quality in Information Systems research. In this short research note, I build on the work introduced in these papers and document two data quality issues discovered in a large open dataset commonly used in research. Inside Airbnb (IA) collects data from places and reviews as posted by users of Airbnb.com. Visitors can effortlessly download the data collected by IA for several locations around the globe. While the dataset is widely used in academic research, no thorough investigation of the dataset and its validity has been conducted. This note examines the dataset and explains an issue of incorrect data added to the dataset. Findings suggest that this issue can be attributed to systemic errors in the data collection process. The results suggest that the use of unverified open datasets can be problematic, although the discoveries presented in this work may not be significant enough to challenge all published research that used the IA dataset. Additionally, findings indicate that the incorrect data happens because of a new feature implemented by Airbnb. Thus, unless changes are made, it is likely that the consequences of this issue will only become more severe. Finally, this note explores why reproducibility is a problem when two different releases of the dataset are compared.
Similar resources
The Prediction of Booking Destination On Airbnb Dataset
This report covers our analysis of the Airbnb dataset and the model we built for the prediction task on it. The dataset comes from an ongoing Kaggle competition supported by Airbnb. We first carried out a comprehensive analysis of the dataset, explored most of its features, and collected all the features we thought were useful. Then we described and interpreted the prediction task and the evaluation met...
Transport-domain applications of widely used data sources in the smart transportation: A survey
The rapid growth of population and the permanent increase in the number of vehicles engender several issues in transportation systems, which in turn call for an intelligent and cost-effective approach to resolve the problems in an efficient manner. Smart transportation is a framework that leverages the power of Information and Communication Technology for acquisition, management, and mining of ...
Identification of the Features of E-reader Applications and Evaluation of Widely Used Iranian Applications
Purpose: The purpose of this study is to identify the features of e-reader applications through a systematic review of texts, and to evaluate four widely used Iranian e-reader applications (Fidibo, Taghche, Ketabrah, ketab sabz) in terms of the identified features. Method: The present research is an applied study in terms of purpose that was conducted on the basis of a systematic review frame...
High performance of the support vector machine in classifying hyperspectral data using a limited dataset
To prospect mineral deposits at regional scale, recognition and classification of hydrothermal alteration zones using remote sensing data is a popular strategy. Due to the large number of spectral bands, classification of the hyperspectral data may be negatively affected by the Hughes phenomenon. A practical way to handle the Hughes problem is preparing a lot of training samples until the size ...
How R helps Airbnb make the most of its data
At Airbnb, R has been amongst the most popular tools for doing data science in many different contexts, including generating product insights, interpreting experiments, and building predictive models. Airbnb supports R usage by creating internal R tools and by creating a community of R users. At the end of the post, the authors provide some specific advice for practitioners who wish to incorpor...
Journal
Journal title: Decision Support Systems
Year: 2021
ISSN: 1873-5797, 0167-9236
DOI: https://doi.org/10.1016/j.dss.2020.113453